Liu-type pretest and shrinkage estimation for the conditional autoregressive model

Spatial regression models have recently received considerable attention in a variety of fields as a way to address spatial autocorrelation. One important class of spatial models is the conditional autoregressive (CA) model. These models have been widely used to analyze spatial data in areas such as geography, epidemiology, disease surveillance, civilian planning, and the mapping of poverty indicators. In this article, we propose Liu-type pretest, shrinkage, and positive shrinkage estimators for the large-scale effect parameter vector of the CA regression model. The proposed estimators are evaluated analytically via their asymptotic bias, quadratic bias, and asymptotic quadratic risks, and numerically via their relative mean squared errors. Our results demonstrate that the proposed estimators are more efficient than the Liu-type estimator. Finally, we apply the proposed estimators to the Boston housing prices data and use a bootstrapping technique to evaluate the estimators based on their mean squared prediction error.


Introduction
Data collected across geographical areas may exhibit dependencies in which closer observations are more similar than those farther apart. This behaviour can be modeled by incorporating a covariance structure into traditional statistical models. One such model is the spatial regression model, which accommodates different types of spatial dependencies. Applications of spatial regression models have been growing in fields such as ecology, epidemiology, disease mapping, public health, psychology, and others.
In the context of time series, autoregressive models represent the observation at time $t$ as a linear function of its recent past values. Similarly, autoregressive models in the spatial framework model the data from a specific location, known as a site, as a function of the data from nearby locations. A site is a physical location where the data are collected, and the notion of neighborhood between two sites is defined by a specific distance or closeness metric. One important class of spatial regression models is the conditional autoregressive (CA) model, so named because its mean and variance can be written in conditional expectation form. The CA model has recently been applied extensively in a wide range of areas. For example, Shen X. et al. [1] proposed a CA model to analyze heterogeneous genetic effects among individuals, which are treated as random effects in their model. Pérez-Molina [2] modeled hierarchical relationships using multilevel models with random intercepts and a CA component to account for spatial effects, and demonstrated that such models significantly improve housing price modeling. Tharmin S.A. et al. [3] used a Bayesian CA model to map the relative risk of the spread of dengue fever in Makassar, Indonesia, and demonstrated that Makassar is still vulnerable to dengue fever. Qiang Z. et al. [4] proposed a Bayesian bivariate CA model to establish the links between crash frequencies and traffic attributes. Dibakar S. et al. [5] investigated the relationship between bicycle crash frequency and its contributing factors at the census block group level in the state of Florida, USA, using the class of CA models within a hierarchical Bayesian framework. Ver Hoef J.M. et al. [6] discussed six different types of practical ecological inferences that can be made using the CA and simultaneous autoregressive (SA) models; they compared the two models and demonstrated their evolution as well as their connection to partial correlations. Wang C. et al. [7] used spatial Poisson-lognormal models with CA priors to investigate the impact of traffic congestion on road accidents. Kleinschmidt I. et al. [8] explored the spatial and temporal variation in small-area malaria incidence rates using CA models. Gelfand, A. E., and Vounatsou, P. [9] used multivariate CA models for the analysis of spatial data and applied their models to study child growth and the spatial variation in HLA-B allele frequencies.
In classical statistical inference, we use the sample information to make inference about the unknown parameter(s). In the Bayesian framework, we combine non-sample information, known as uncertain prior information (UPI), with the sample information to make the inference. The UPI can be obtained from different sources, for example, historical information about the parameter(s) or selection methods used in regression analysis. In many cases, researchers have previous knowledge about some of the regression variables that will be used in their regression model, or may formulate a linear hypothesis of the form $H_0: H\beta = h$, where $\beta$ is a $(p \times 1)$ vector of regression coefficients, $H$ is a known $(p_2 \times p)$ matrix of rank $p_2 \le p$, and $h$ is a $(p_2 \times 1)$ fixed vector of constants in $\mathbb{R}^{p_2}$. This restriction is commonly used in regression, experimental design, machine learning, and other fields to produce a restricted model that can perform at least as well as the full model with all available predictors. It can also be considered a variable selection technique in which the reduced model is tested to investigate the importance of some variables in explaining the variation in the response variable, and to decide how useful the model is for prediction. In our case, we use this hypothesis to produce a submodel with fewer predictors. Theoretically, we assume such a restriction in order to study the performance of the reduced model compared with the full one; numerically, we force some of the coefficients to be zero (not significant) to confirm our analytical results. In real-life problems, we can gain knowledge about the important variables, eliminate redundant variables, and detect multicollinearity using techniques such as the AIC, BIC, best subset selection, penalization algorithms, and others. The correlation matrix among all variables, including the response, is also a helpful tool to justify our restriction.
One of the oldest methods that combines the sample information and the UPI is pretest estimation. The pretest estimator combines the sample data model, known as the full model, and the UPI model, known as the submodel, using binary weights: it chooses the full model estimator if the test statistic rejects the null hypothesis ($H_0$) at a specific level of significance $\alpha$, and the submodel estimator otherwise. Later, the shrinkage estimator was introduced, which uses a smooth function of the test statistic instead of binary weights. Further, an improved version of the shrinkage estimator, known as the positive shrinkage estimator, was proposed. The three estimators have been discussed extensively in the literature under different settings. Al-Momani M. et al. [10] proposed the pretest, shrinkage, and positive shrinkage estimators for the vector of regression coefficients of the marginal model with multinomial response, and showed the superiority of the positive shrinkage estimator over the classical generalized estimating equation (GEE) estimator. Al-Momani, M. and Dawod B.A. [11] used the idea of pretest and shrinkage estimation for the Autoregressive Conditionally Heteroscedastic (ARCH) model. They found that the positive shrinkage estimator outperformed the restricted, pretest, and shrinkage estimators regardless of whether the restriction, a linear hypothesis stating that some of the coefficients of the ARCH model's parameter vector are not significant, is true. Li, Y and Jin, B [12] investigated the sparsity and homogeneity of regression coefficients using prior constraint information, and showed that incorporating prior knowledge can increase the effectiveness of both sparsity and homogeneity identification. Arumairajan, S. [13] proposed an almost unbiased stochastic restricted Liu estimator by combining the modified almost unbiased Liu estimator with the mixed estimator, for settings where multicollinearity is present and stochastic restrictions are available. He showed that it outperformed the ordinary least squares, mixed, ridge, and other estimators considered in his study in terms of mean squared error. Ridge regression theory and important shrinkage and model selection techniques, with applications to machine learning, have been studied extensively for different models and settings by Saleh, A. K. et al. [14,15]. For more details about shrinkage estimators, the reader is referred to S.E. Ahmed [16], Nkurunziza, S. et al. [17], Peng, L. et al. [18], and Saleh, A. K. [19], among others.
One common problem that researchers face when fitting a multiple regression model by the ordinary least squares (OLS) method is multicollinearity, which occurs when some of the explanatory variables are correlated. This problem may lead to insignificant regression coefficients or to coefficients with unexpected signs. Many estimation methods have been proposed to improve on the OLS estimator. For instance, Hoerl and Kennard [20] proposed the ridge estimator as an alternative to the OLS estimator. Liu K. [21] introduced a biased estimator in linear regression. A modified version of the Liu estimator was proposed by Li and Yang [22]. Yüzbaşı, B. et al. [23] proposed pretest and shrinkage-type ridge regression estimators for linear models. Recently, Yüzbaşı, B. et al. [24] proposed pretest, shrinkage, and pretest-shrinkage Liu-type estimation in linear models. Babar, Iqra et al. [25] proposed new estimators for the shrinkage parameter of the Liu estimator based on quantiles of the regression coefficients, and showed that the new estimators outperformed the existing ones in terms of mean squared error and absolute error. Arashi M. et al. [26] proposed improved Liu-type unrestricted, restricted, pretest, shrinkage, and positive shrinkage estimators for the vector of regression coefficients, and showed the superiority of the proposed method analytically and numerically. With respect to robust regression, Arashi, M. et al. [27] defined Liu-type rank-based estimators. They examined the asymptotic behavior of the estimators, provided conditions on the biasing parameters under which the proposed estimators are superior, and supported their findings with numerical calculations. Arashi, M. et al. [28] proposed a ridge estimator for high-dimensional multicollinear data. They proved its consistency, derived some asymptotic properties, and applied it in simulation experiments and to a real data set. Arashi, M. et al. [29] proposed a re-scaled LASSO for multicollinear situations. Their numerical analysis demonstrated that the scaled LASSO frequently performs better than the LASSO and the elastic net while being comparable to other sparse modeling techniques. Arashi, M. et al. [30] developed an improved ridge approach for genome regression modeling and used a rank ridge estimator for parameter estimation and prediction when multicollinearity is present together with outliers in the data set.
In this manuscript, we aim to propose efficient estimators for the large-scale effect parameter vector ($\beta$) in the CA model when it is suspected that some of the coefficients are not significant. We therefore partition the $(p \times 1)$ parameter vector $\beta$ as $(\beta_1, \beta_2)$, where $\beta_1$ is a $(p_1 \times 1)$ vector containing the coefficients of the main effects, $\beta_2$ is a $(p_2 \times 1)$ vector of unimportant or nuisance parameters, and $p_1 + p_2 = p$. We are primarily interested in estimating $\beta_1$ when $\beta_2$ is suspected to be zero or close to zero. In some cases, the full model estimator may be highly variable and difficult to interpret, while the submodel estimator may be heavily biased and under-fitted. To overcome this issue, we consider the Liu-type pretest, shrinkage, and positive shrinkage estimators.
The rest of the paper is organized as follows. Section 2 provides a brief overview of the CA model. The maximum likelihood estimators of the CA model parameters are given in Section 3. In Section 4, we propose the Liu-type estimators, and in Section 5 we discuss their asymptotic properties in terms of bias, quadratic bias, and quadratic risk. We compare the estimators using Monte Carlo simulation and a real data example in Section 6. Some conclusions are given in Section 7.

Conditional autoregressive model
Assume, in accordance with Cressie and Wikle [31], that there are $n$ spatial sites (usually referred to as locations, geographical areas, etc.). The collection of these sites is known as a lattice, denoted by $S = \{s_1, s_2, \ldots, s_n\}$. For the $i$th site $s_i$, the set of neighboring sites, denoted by $N(s_i)$, is defined as $N(s_i) = \{s_j : j \text{ is a neighbor of } i\}$, $j = 1, 2, \ldots, n$, where the neighborhood structure is defined by a certain metric. For example, on a regular lattice two sites are rook-based neighbors if they share a common boundary, and queen-based neighbors if they share a common boundary or a corner. Let $Y_n(s) = (Y(s_1), Y(s_2), \ldots, Y(s_n))'$ be the vector of observations collected at the sites $\{s_1, s_2, \ldots, s_n\}$, let $X(s_i) = X_i = (X_{1i}, X_{2i}, \ldots, X_{pi})'$ be the vector of covariates at site $s_i$, and let $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$ be a $p \times 1$ vector of parameters, known as the large-scale effect on $Y_n(s)$.
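We assume that $Y_n(s)$ is continuous and follows a Gaussian process with mean $\mu(s) = E(Y_n(s)) = X'(s)\beta$ and covariance matrix $\mathrm{Var}(Y_n(s)) = \sigma^2 (I_n - \rho W^*)^{-1} D$, where $\rho$ is the spatial dependence parameter, $w_{ij}$ is the proximity weight between sites $s_i$ and $s_j$ with row sum $w_{i+} = \sum_j w_{ij}$, $W^* = (w_{ij}/w_{i+})$ is called the standardized proximity matrix, and $D$ is a diagonal matrix with $d_{ii} = 1/w_{i+}$. For simplicity, the covariate vectors for all sites are consolidated into a design matrix $X(s)$, and the subscripts $(n, s)$ are dropped unless we need to present them explicitly. That is, the data on the lattice are denoted by $(Y, X)$. Following Besag et al. [32], the conditional autoregressive (CA) model follows a multivariate Gaussian (normal) distribution, $Y \sim N_n(X\beta, V_n)$, where $V_n = \sigma^2 (I_n - \rho W^*)^{-1} D$. In the regression context, the CA model is given by

$$Y = X\beta + \varepsilon, \tag{1}$$

where $\varepsilon \sim N_n(0, V_n)$. The model is known as a conditional autoregressive regression model because the mean and variance of $Y(s_i)$ can be written in conditional form as follows:

$$E\big(Y(s_i) \mid Y(N(s_i))\big) = X'(s_i)\beta + \rho \sum_{s_j \in N(s_i)} \frac{w_{ij}}{w_{i+}} \big(Y(s_j) - X'(s_j)\beta\big), \qquad \mathrm{Var}\big(Y(s_i) \mid Y(N(s_i))\big) = \frac{\sigma^2}{w_{i+}}.$$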
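As a minimal sketch of the covariance structure $V_n = \sigma^2 (I_n - \rho W^*)^{-1} D$, the following Python snippet builds a queen-contiguity proximity matrix on an $N \times N$ lattice and the implied precision matrix $V_n^{-1}$. It assumes binary weights ($w_{ij} = 1$ for neighbors, 0 otherwise) and row standardization $W^* = (w_{ij}/w_{i+})$; these are illustrative assumptions, and the function names are hypothetical. Under these assumptions $D^{-1} W^* = W$, so $V_n^{-1} = (\mathrm{diag}(w_{i+}) - \rho W)/\sigma^2$, which is symmetric.

```python
def queen_proximity(N):
    """Binary queen-contiguity matrix W for an N x N lattice (row-major site order)."""
    n = N * N
    W = [[0] * n for _ in range(n)]
    for r in range(N):
        for c in range(N):
            i = r * N + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # skip the site itself and cells outside the lattice
                    if (dr, dc) != (0, 0) and 0 <= rr < N and 0 <= cc < N:
                        W[i][rr * N + cc] = 1
    return W


def car_precision(W, rho, sigma2=1.0):
    """Precision matrix V^{-1} = (diag(w_i+) - rho * W) / sigma2 (binary W assumed)."""
    n = len(W)
    row_sums = [sum(row) for row in W]  # w_{i+}: number of neighbors of site i
    return [
        [((row_sums[i] if i == j else 0.0) - rho * W[i][j]) / sigma2
         for j in range(n)]
        for i in range(n)
    ]


W = queen_proximity(3)
Q = car_precision(W, rho=0.5)
print(sum(W[4]))         # number of queen neighbors of the centre site
print(Q[0][1], Q[1][0])  # off-diagonal entries for two neighboring sites
```

Working with the precision matrix avoids an explicit matrix inversion and makes the symmetry of the implied covariance easy to check.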

The maximum likelihood estimation
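The maximum likelihood estimators (MLEs) of the parameter vector $\beta$, $\sigma^2$, and the spatial dependence parameter $\rho$ are derived by a two-step profile-likelihood procedure; see Cressie [33]. We first fix $\rho$ and solve the log-likelihood equations for $\beta$ and $\sigma^2$, and then plug $\hat\beta(\rho)$ and $\hat\sigma^2(\rho)$ back into the log-likelihood to find the MLE of $\rho$, which is denoted by $\hat\rho$. For fixed $\rho$, the MLEs of $\beta$ and $\sigma^2$ are given by

$$\hat\beta(\rho) = (X' V_n^{-1} X)^{-1} X' V_n^{-1} Y, \tag{2}$$

$$\hat\sigma^2(\rho) = \frac{1}{n}\big(Y - X\hat\beta(\rho)\big)' D^{-1}(I_n - \rho W^*)\big(Y - X\hat\beta(\rho)\big),$$

where the factor $\sigma^2$ in $V_n$ cancels in (2). Then, the MLE of $\rho$ is the value that maximizes the profile log-likelihood $L^*(\rho)$; see Ord [34]. Finally, we obtain the MLEs of $\beta$ and $\sigma^2$ by evaluating $\hat\beta(\rho)$ and $\hat\sigma^2(\rho)$ at $\hat\rho$. We denote the MLEs of $(\beta, \sigma^2, \rho)$ by $\hat\vartheta = (\hat\beta, \hat\sigma^2, \hat\rho)$. Mardia and Marshall [35] proved the consistency and asymptotic normality of $\hat\vartheta$, which leads to the asymptotic normality of the large-scale parameter vector estimator $\hat\beta$.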

Efficient estimation strategies
Consider the following multiple linear regression model:

$$Y = X\beta + \varepsilon,$$

where $\varepsilon \sim N_n(0, \sigma^2 I_n)$. The ordinary least squares estimator of $\beta$, denoted by $\hat\beta^{OLS}$, is given by $(X'X)^{-1}X'Y$ and enjoys some good properties. However, when multicollinearity exists, the entries of $(X'X)^{-1}$ become large, which causes a large variance of $\hat\beta^{OLS}$. To overcome the problem of multicollinearity, Hoerl and Kennard [20] proposed the ridge estimator, which is given by

$$\hat\beta^{Ridge} = (X'X + kI_p)^{-1}X'Y,$$

where $k > 0$. Note that in the limiting case $k = 0$, $\hat\beta^{Ridge} = \hat\beta^{OLS}$, and as $k \to \infty$, $\hat\beta^{Ridge} \to 0$. Later, Liu [21] proposed a biased estimator to deal with multicollinearity that benefits from both the ridge estimator and the shrinkage estimator. It is denoted by $\hat\beta^{LU}$ and given by

$$\hat\beta^{LU} = (X'X + I_p)^{-1}(X'X + dI_p)\hat\beta^{OLS},$$

where $0 < d < 1$ is known as the biasing parameter. Clearly, when $d = 1$, $\hat\beta^{LU} = \hat\beta^{OLS}$. In the next subsection, we introduce the Liu estimator for the CA model.
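To make the relationship between the OLS, ridge, and Liu estimators concrete, here is a small pure-Python sketch for the ordinary linear model (not the CA model). It uses the standard forms $(X'X + kI)^{-1}X'Y$ and $(X'X + I)^{-1}(X'Y + d\hat\beta^{OLS})$; the helper names and the toy data are illustrative assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gram(X, shift=0.0):
    """Return X'X + shift * I."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) + (shift if i == j else 0.0)
             for j in range(p)] for i in range(p)]


def xty(X, y):
    """Return X'y."""
    return [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(len(X[0]))]


def ols(X, y):
    return solve(gram(X), xty(X, y))


def ridge(X, y, k):
    return solve(gram(X, k), xty(X, y))


def liu(X, y, d):
    b = ols(X, y)
    # (X'X + d I) b_OLS equals X'y + d * b_OLS
    rhs = [v + d * bj for v, bj in zip(xty(X, y), b)]
    return solve(gram(X, 1.0), rhs)


# Toy data with two nearly collinear columns (illustrative values only)
X = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
y = [2.0, 4.1, 6.2, 7.9, 10.1]
print(ols(X, y))
print(ridge(X, y, k=0.5))
print(liu(X, y, d=0.5))
```

Setting `d=1.0` in `liu` or `k=0.0` in `ridge` recovers the OLS solution up to rounding, which mirrors the limiting cases noted in the text.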

Liu estimators for the CA model
Generally speaking, subjective information about the importance of certain regression coefficients is available. Such information divides the $p \times 1$ regression parameter vector as $\beta = (\beta_1, \beta_2)$, where $\beta_1$ and $\beta_2$ are of dimensions $p_1 \times 1$ and $p_2 \times 1$, respectively, with $p = p_1 + p_2$. Likewise, the $n \times p$ design matrix is partitioned as $X = (X_1, X_2)$, where $X_1$ is an $n \times p_1$ matrix and $X_2$ is an $n \times p_2$ matrix. So, the model in (1) can be rewritten as

$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon. \tag{9}$$

We are initially interested in estimating $\beta_1$ by removing $\beta_2$ when $X_2$ does not contribute significantly to explaining the variation in the response variable. Such information can be obtained either from variable selection approaches or from uncertain prior information. In other words, we may consider testing the restriction

$$H_0: \beta_2 = 0. \tag{10}$$

Assuming we have obtained such information about $X_2$, the candidate sub-model is given by

$$Y = X_1\beta_1 + \varepsilon. \tag{11}$$

The MLE of $\beta_1$ for the sub-model in (11), denoted by $\hat\beta_1^{SM}$, can be obtained in the same manner as $\hat\beta$ in (2), with $X_1$ in place of $X$. For the model in (9), the MLE of $\beta_1$, denoted by $\hat\beta_1$, is obtained by maximizing the log-likelihood: setting its partial derivatives with respect to $\beta_1$ and $\beta_2$ equal to zero and solving the two resulting equations yields $\hat\beta_1$, and $\hat\beta_2$ has the same formula with the indices 1 and 2 interchanged. Note that $\hat\beta_1$ can also be written in terms of $\hat\beta_1^{SM}$. We define the Liu estimator of $\beta_1$, denoted by $\hat\beta_1^{LU}$, in (15), with biasing parameter $0 < d < 1$, and refer to it as the full model estimator of $\beta_1$. The Liu estimator of the sub-model in (11), denoted by $\hat\beta_1^{LUS}$, is defined with its own biasing parameter $0 < d_s < 1$. In fact, under the null hypothesis in (10), or when $\beta_2$ is close to 0, $\hat\beta_1^{LUS}$ performs better than $\hat\beta_1^{LU}$; but as $\beta_2$ moves away from the null space, $\hat\beta_1^{LUS}$ becomes inefficient, while $\hat\beta_1^{LU}$ remains consistent.

The pretest and shrinkage Liu-type estimators
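The pretest Liu-type estimator of $\beta_1$ depends on testing the null hypothesis in (10). It chooses $\hat\beta_1^{LU}$ if the hypothesis is rejected at the $\alpha$ level of significance, and $\hat\beta_1^{LUS}$ otherwise. It is denoted by $\hat\beta_1^{PTL}$ and given by

$$\hat\beta_1^{PTL} = \hat\beta_1^{LU} - \big(\hat\beta_1^{LU} - \hat\beta_1^{LUS}\big)\, I(L_n \le l_{n,\alpha}),$$

where $I(\cdot)$ is the indicator function, $L_n$ is a suitable test statistic for testing $H_0$ in (10), and $l_{n,\alpha}$ is its $\alpha$-level critical value. The Liu-type shrinkage estimator of $\beta_1$ replaces the indicator with a smooth function of $L_n$. It is denoted by $\hat\beta_1^{SL}$ and given by

$$\hat\beta_1^{SL} = \hat\beta_1^{LUS} + \big(1 - (p_2 - 2)L_n^{-1}\big)\big(\hat\beta_1^{LU} - \hat\beta_1^{LUS}\big), \qquad p_2 \ge 3.$$

However, $\hat\beta_1^{SL}$ may over-shrink and produce unexpected signs for some of the coefficients when $p_2 - 2 > L_n$. This issue is handled by the Liu-type positive shrinkage estimator of $\beta_1$, which is defined as

$$\hat\beta_1^{PSL} = \hat\beta_1^{LUS} + \big(1 - (p_2 - 2)L_n^{-1}\big)^{+}\big(\hat\beta_1^{LU} - \hat\beta_1^{LUS}\big),$$

where $u^+ = \max(0, u)$.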
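The way the three estimators combine the full-model and sub-model Liu estimates can be sketched in a few lines of Python. This is only an illustration under the usual Stein-type forms with weight $1 - (p_2 - 2)/L_n$; the function name is hypothetical, and the test statistic value and the critical value are taken as given inputs rather than computed from data.

```python
def liu_type_combinations(b_full, b_sub, L_n, p2, crit):
    """Combine full-model and sub-model Liu estimates of beta_1.

    b_full, b_sub : lists holding the two estimates (same length)
    L_n           : observed (positive) test statistic for H0: beta_2 = 0
    p2            : number of restricted coefficients (p2 >= 3 assumed)
    crit          : critical value of the test, supplied by the caller
    """
    diff = [f - s for f, s in zip(b_full, b_sub)]
    # Pretest: keep the sub-model estimate unless H0 is rejected
    pretest = list(b_sub) if L_n <= crit else list(b_full)
    # Stein-type weight placed on the full-model estimate
    w = 1.0 - (p2 - 2) / L_n
    shrink = [s + w * d for s, d in zip(b_sub, diff)]
    # Positive part: truncate a negative weight at zero to avoid over-shrinkage
    pos_shrink = [s + max(0.0, w) * d for s, d in zip(b_sub, diff)]
    return pretest, shrink, pos_shrink


# Illustrative numbers only; 18.31 is roughly the 95% chi-square quantile with 10 df
print(liu_type_combinations([1.0, -0.5], [0.8, -0.3], L_n=4.0, p2=10, crit=18.31))
print(liu_type_combinations([1.0, -0.5], [0.8, -0.3], L_n=32.0, p2=10, crit=18.31))
```

In the first call the weight is negative, so the plain shrinkage estimate overshoots past the sub-model estimate while the positive-part version stops at it; in the second call the two shrinkage versions coincide.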

Asymptotic results
In this section, we study the asymptotic behaviour of the proposed estimators under a sequence of local alternatives $\{H_{(n)}\}$ given by

$$H_{(n)}: \beta_2 = \frac{\xi}{\sqrt{n}}, \tag{20}$$

where $\xi$ is a $p_2 \times 1$ fixed and known vector. Clearly, if $\xi = 0$, the local alternatives in (20) reduce to (10). Let $\beta_1^*$ be any of the proposed estimators of $\beta_1$, and let $M$ be a $p_1 \times p_1$ positive definite weight matrix. Define the cumulative distribution function of $\theta_n^* = \sqrt{n}(\beta_1^* - \beta_1)$ by $F(x) = \lim_{n \to \infty} P_{H_{(n)}}(\theta_n^* \le x)$, and the quadratic loss function of $\beta_1^*$ as

$$L(\beta_1^*; M) = n(\beta_1^* - \beta_1)' M (\beta_1^* - \beta_1) = \mathrm{tr}\big(M\, \theta_n^* \theta_n^{*\prime}\big),$$

where $\mathrm{tr}(A)$ is the trace of the matrix $A$. If $\theta_n^* \xrightarrow{D} \theta^*$, where $\xrightarrow{D}$ denotes convergence in distribution, then the asymptotic quadratic risk (AR) of $\beta_1^*$ is defined as

$$AR(\beta_1^*; M) = \mathrm{tr}\big(M\, E(\theta^* \theta^{*\prime})\big).$$

The asymptotic joint normality of the sub-model and full model Liu estimators is the main tool in deriving the AR expressions; we list two theorems below that are used to find these expressions. Assume that the assumptions of Theorem 2 of Mardia and Marshall [35] hold, together with the following: iii. lim

and $\xrightarrow{d}$ denotes convergence in distribution.
Proof: Note that $\hat\beta^{LU}$ is a linear combination of $\hat\beta$. Hence, by the theorem of Mardia and Marshall [35], as $n \to \infty$, $\sqrt{n}(\hat\beta^{LU} - \beta)$ converges in distribution to a multivariate Gaussian distribution with mean $-(1 - d)(C - I_p)^{-1}\beta$.

Under the previous assumptions and the sequence of local alternatives in (20), as $n \to \infty$, we have:
where $\lambda_{11.2} = \lambda_1 - C_{12}C_{22}^{-1}\big((\beta_2 - \xi) - \lambda_2\big)$ is the conditional distribution mean of $\beta_1$ given $\beta_2 = 0_{p_2}$. The proof of Theorem (2) is similar to the proof of Theorem (1), with little modification. Also, we refer to Bahadır Y. et al. [36] for a similar proof.

Asymptotic distributional and quadratic bias of the estimators
The asymptotic distributional bias expressions, denoted by $AB(\beta_1^*)$, where $\beta_1^*$ is any of the proposed estimators, are given in the following theorem.
Theorem 4 Let $y = (y_1, y_2, \ldots, y_q)'$ be $N_q(\mu, \Sigma)$, and let $\phi$ be any measurable function, then is the chi-square random variable with ($n$) degrees of freedom and $\Delta = \mu'\mu/2$ is the non-centrality parameter.

Asymptotic quadratic risk
The asymptotic quadratic risk (QR) can be used as a measure of performance relative to the classical MLE of the full model. To obtain the QR expressions of the proposed estimators, we define the quadratic loss function as: where $\hat\beta_1^*$ is any of the proposed estimators, and $\hat\beta_1$ is the MLE of the model in (11). Also, the asymptotic covariance matrix (AC) of $\hat\beta_1^*$ is defined as: Finally, for any $p_1 \times p_1$ positive definite matrix $M$, $QR(\hat\beta_1^*)$ is defined as: where $\mathrm{tr}(W)$ is the trace of the matrix $W$. To derive the QR expressions, we use the following theorem.
Theorem 5 Let $y = (y_1, y_2, \ldots, y_q)'$ be $N_q(\mu, \Sigma)$, and let $\phi$ be any measurable function, then The proof can be found in [37].
$\ldots\, I(L_n \le l_{n,\alpha})\} \mid \theta_n^{(3)}\}$.
Analytical risk comparisons of the proposed estimators can be carried out based on the QR expressions. However, our results are similar to those discussed by Al-Momani, M. et al. [38] and Bahadır Y. et al. [36], so we rely on numerical comparisons to check the estimators' performance.

Numerical study
In this section, we examine the performance of the proposed estimators numerically through Monte Carlo simulation experiments and a real data example.

Monte Carlo Simulation
We conduct a Monte Carlo simulation using square $N \times N$ lattices with $N = 7, 10$ and corresponding sample sizes $n = N^2 = 49, 100$. The design matrix $X$ is generated from a multivariate Gaussian distribution with mean 0 and a covariance matrix with first-order autoregressive structure, which controls the degree of multicollinearity. That is, $\mathrm{cov}(X_i, X_j) = \rho_x^{|i-j|}$, and we use $\rho_x \in \{0.3, 0.6, 0.9\}$. The error term $\varepsilon$ is generated from a multivariate Gaussian distribution with CA covariance structure, so $\varepsilon \sim N(0, \sigma^2 (I_n - \rho W^*)^{-1} D)$. We use $\sigma^2 = 1$ and employ a queen-based contiguity neighborhood for the matrix $W^*$. The spatial dependence parameter $\rho$ varies over the set $\{-0.9, -0.5, 0, 0.5, 0.9\}$. The $p \times 1$ parameter vector $\beta$ is partitioned as $\beta = (\beta_1', \beta_2')'$, and $\Delta$ is the non-centrality parameter defined as $\Delta = \|\beta - \beta_0\|$, where $\beta_0 = (\beta_1', \mathbf{0}')'$ with $\mathbf{0}$ a vector of zeros, and $\|\cdot\|$ is the Euclidean norm. The values of $\Delta$ vary from 0 to 2. Clearly, when $\Delta = 0$ the null hypothesis in (10) is true, and it becomes false as $\Delta$ moves away from the null space. The numbers of regression coefficients that form the vector $\beta$ are $(p_1, p_2) \in \{(5, 10), (5, 20), (5, 30)\}$, and we use $\alpha = 0.05$. To fit the full and sub CA models, we use the spdep R package [39] and apply the function spautolm to the generated data. Each case is repeated for 2000 Monte Carlo runs. In each run, the full model, sub-model, pretest, shrinkage, and positive shrinkage Liu-type estimators of $\beta_1$ are computed and the mean squared error (MSE) of each estimator is obtained. The simulated relative efficiencies (SRE) with respect to the full model MLE $\hat\beta_1$ of $\beta_1$ are then calculated for all values of $\Delta$ using the following formula:

$$SRE(\hat\beta_1^*) = \frac{MSE(\hat\beta_1)}{MSE(\hat\beta_1^*)},$$

where $MSE(\hat\beta_1^*)$ is the simulated MSE of any estimator $\hat\beta_1^*$ of $\beta_1$.
1. ... increases, its efficiency increases for fixed values of $\rho$ and $\rho_x$. Furthermore, the efficiency of $\hat\beta_1^{LU}$ increases as the multicollinearity among the explanatory variables in the design matrix becomes stronger.
2. When $\Delta = 0$, the Liu-type sub-model estimator dominates all other estimators. This is expected, as the null hypothesis is true. However, as $\Delta$ moves away from the null space, the SRE of this estimator decreases sharply, and it becomes inefficient compared with the rest of the estimators.
3. As the correlation coefficient $\rho_x$ among the explanatory variables increases, the SRE values also increase, holding the other parameters fixed.
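Two ingredients of the simulation design can be sketched in plain Python: the first-order autoregressive covariance used to generate the columns of the design matrix, and the relative efficiency computed from Monte Carlo replicates. The MSE is taken here as the average squared Euclidean distance to the true $\beta_1$, which is an assumption about the exact formula, and all function names are hypothetical.

```python
def ar1_cov(p, rho_x):
    """Covariance matrix with entries rho_x ** |i - j| (first-order autoregressive)."""
    return [[rho_x ** abs(i - j) for j in range(p)] for i in range(p)]


def mse(estimates, beta1):
    """Average squared Euclidean distance between the replicates and beta_1."""
    total = sum(sum((e - b) ** 2 for e, b in zip(est, beta1)) for est in estimates)
    return total / len(estimates)


def sre(est_full_mle, est_other, beta1):
    """Simulated relative efficiency: MSE(full-model MLE) / MSE(other estimator)."""
    return mse(est_full_mle, beta1) / mse(est_other, beta1)


# Illustrative replicates for a two-dimensional beta_1 (made-up numbers)
beta1 = [1.0, 1.0]
full_runs = [[1.2, 0.8], [0.9, 1.3]]
other_runs = [[1.1, 0.9], [0.95, 1.1]]
print(ar1_cov(3, 0.5))
print(sre(full_runs, other_runs, beta1))
```

A value of `sre` above one indicates that the second estimator has a smaller simulated MSE than the full-model MLE.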

Boston housing data
Harrison and Rubinfeld [40] examined a number of practical concerns regarding the use of housing market information for census tracts in the Boston Standard Metropolitan Statistical Area in 1970. Their main goal was to determine the correlation between a group of (15) variables and the median price of owner-occupied homes in Boston. A corrected version of the data set with additional spatial information was provided by Gilley and Pace [41]. The data set is available in the R packages MASS and spdep. The variables, as given in the package, are as follows:
• TRACT: Census tract id number.
• MEDV: Median value of owner-occupied homes (in 1000's USD).
• CHAS: A dummy variable with two levels, 1 if the tract borders the Charles River; 0 otherwise.
• NOX: Levels of nitrogen oxides concentration (parts per 10 million) per town.
• DIS: Weighted distance to five employment centers.
• RAD: An index of accessibility to radial highway per town (constant for all Boston tracts).
• LSTAT: Percentage of lower status population.
• TAX: Property tax rate per (USD 10,000) per town (constant for all Boston tracts).
• B: The variable $B = 1000(b - 0.63)^2$, where $b$ is the proportion of blacks.
Fig (4) shows a plot of the correlation coefficients among all variables, in which a strong linear relation appears in dark colors; as the relation becomes weaker, the color becomes lighter, and it may disappear when no linear relation exists. The figure shows strong linear relationships between CMEDV and some of the other variables. As we do not have any prior information about the available covariates, we may apply any variable selection method. In our scenario, we employ the AIC/BIC selection criteria to produce a submodel.
The full model, which contains all available covariates, and the submodel obtained by the AIC/BIC selection are given in Table (1). To evaluate the performance of the proposed estimators, we used a bootstrapping method suggested by Solow [42] and computed the mean squared prediction error (MSPE) of each estimator as follows:
1. Use the spautolm function to fit the full CA model with all available variables as presented in Table (1), and obtain the maximum likelihood estimates of $\beta$, $\sigma^2$, and the spatial dependence parameter $\rho$, together with the matrix $V_n$ and the biasing parameter $d$, the latter using the formula suggested by Alheety et al. [43]. We estimate $d$ and $V_n$ by replacing $\sigma^2$, $\rho$, and $\beta$ by their corresponding MLEs, where $\hat\sigma^2 = (Y - X\hat\beta)'(Y - X\hat\beta)/(n - p)$.
2. Employ the Cholesky decomposition of the matrix $\hat V_n$ to write it as $\hat V_n = \hat A \hat A'$, where $\hat A$ is an $(n \times n)$ lower triangular matrix.
where K is the number of bootstrapping samples.
9. Compute the relative efficiency of the square root of the MSPE (REMSPE) of $\hat\beta_1^*$, where $\hat\beta_1^*$ is any of the proposed estimators; we use $K = 2000$ bootstrapping samples.
A value of the REMSPE greater than one indicates the superiority of the estimator in the denominator.
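The Cholesky step of the bootstrap procedure and the final efficiency ratio can be sketched as follows. The decomposition is the textbook algorithm for a symmetric positive definite matrix; the `remspe` helper assumes the ratio is formed from two root-MSPE values with the estimator under study in the denominator, and both function names are hypothetical.

```python
import math


def cholesky(V):
    """Return lower-triangular A with V = A A' for a symmetric positive definite V."""
    n = len(V)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(A[i][k] * A[j][k] for k in range(j))
            if i == j:
                A[i][j] = math.sqrt(V[i][i] - s)
            else:
                A[i][j] = (V[i][j] - s) / A[j][j]
    return A


def remspe(mspe_reference, mspe_estimator):
    """Ratio of root-MSPEs with the estimator under study in the denominator (assumed form)."""
    return math.sqrt(mspe_reference) / math.sqrt(mspe_estimator)


# Small illustrative covariance matrix (made-up numbers)
V = [[4.0, 2.0], [2.0, 3.0]]
A = cholesky(V)
print(A)
print(remspe(4.0, 1.0))
```

In practice the input would be the estimated covariance matrix from the fitted CA model rather than a hand-written matrix.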

Conclusion
In this paper, we proposed pretest, shrinkage, and positive shrinkage estimators for the large-scale effect parameter vector of the CA model. We formulated a hypothesis of the form $H_0: \beta_2 = 0$ to obtain the Liu estimator of the main effect $\beta_1$ under this UPI, along with the submodel estimator. We then combined these two estimators to obtain the Liu-type pretest, shrinkage, and positive shrinkage estimators. Further, the estimators were compared analytically based on their asymptotic bias, quadratic bias, and risks, and the related expressions were provided. The estimators were also evaluated numerically via their relative performance in extensive simulation experiments based on different values of the spatial dependence parameter ($\rho$) and different lattice sizes ($N$), and were applied to a real data example. Our analytical and numerical results showed that the submodel estimator is superior whenever the restriction given by $H_0: \beta_2 = 0$ is correct or nearly correct, that is, when the UPI is true. However, when the restriction becomes false and the test statistic rejects the null hypothesis, the submodel estimator becomes inefficient and has the highest MSE, while the Liu-type positive shrinkage estimator showed the best performance compared with the other estimators regardless of the accuracy of the UPI. For future research, the proposed estimation approach might be applied to other spatial regression models, with the performance of the resulting estimators investigated analytically and numerically. Another attractive direction is to extend the proposed estimation strategies to the high-dimensional data (HDD) case of the large-scale effect parameter vector of the CA model, where $p \gg n$, and to study the behavior of the Liu-type estimators. In addition, we can study the Liu-type estimation technique assuming a prior distribution for the CA model, and obtain the updated Liu, pretest, shrinkage, and positive shrinkage Liu-type estimators of $\beta_1$.